The dataset comprises 81 variables and 113,937 entries. The variables explored in this analysis are the following:
Term : Length of the loan in months
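LoanStatus : Current status of the loan, such as Chargedoff, Completed, Defaulted, etc.
EstimatedEffectiveYield : Lender's effective yield: the borrower interest rate minus the servicing fee rate and estimated uncollected interest on charge-offs, plus estimated collected late fees
ProsperScore : Risk score from 1 to 10, with 10 being the least risky (note that the summary below shows a maximum of 11)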
BorrowerAPR : The Borrower’s Annual Percentage Rate (APR) for the loan.
BorrowerRate : The Borrower’s interest rate for this loan.
ListingCategory..numeric. : Category of the listing selected by the borrower, coded as a number (for example, 1 = Debt Consolidation)
EmploymentStatus : Current type of employment
Occupation : Occupation of borrower at the time of listing
EmploymentStatusDuration : Length in months of the borrower's current employment status
IsBorrowerHomeowner : Whether the borrower owns a home at the time of listing (True or False)
ProsperRating..Alpha. : Prosper rating for the borrower as a letter grade
IncomeVerifiable : Whether the borrower's income is verifiable at the time of listing (True or False)
StatedMonthlyIncome : Monthly income of the borrower
MonthlyLoanPayment : Monthly loan payment amount
Recommendations : Number of recommendations the borrower has at the time of listing
DebtToIncomeRatio : The debt to income ratio of the borrower at the time the credit profile was pulled.
LoanOriginalAmount : Original amount of the loan
LoanOriginationQuarter : Quarter of the year in which the loan was originated
A basic exploration of the dataset gives the following summary:
## ListingKey ListingNumber
## 17A93590655669644DB4C06: 6 Min. : 4
## 349D3587495831350F0F648: 4 1st Qu.: 400919
## 47C1359638497431975670B: 4 Median : 600554
## 8474358854651984137201C: 4 Mean : 627886
## DE8535960513435199406CE: 4 3rd Qu.: 892634
## 04C13599434217079754AEE: 3 Max. :1255725
## (Other) :113912
## ListingCreationDate CreditGrade Term
## 2013-10-02 17:20:16.550000000: 6 :84984 Min. :12.00
## 2013-08-28 20:31:41.107000000: 4 C : 5649 1st Qu.:36.00
## 2013-09-08 09:27:44.853000000: 4 D : 5153 Median :36.00
## 2013-12-06 05:43:13.830000000: 4 B : 4389 Mean :40.83
## 2013-12-06 11:44:58.283000000: 4 AA : 3509 3rd Qu.:36.00
## 2013-08-21 07:25:22.360000000: 3 HR : 3508 Max. :60.00
## (Other) :113912 (Other): 6745
## LoanStatus ClosedDate
## Current :56576 :58848
## Completed :38074 2014-03-04 00:00:00: 105
## Chargedoff :11992 2014-02-19 00:00:00: 100
## Defaulted : 5018 2014-02-11 00:00:00: 92
## Past Due (1-15 days) : 806 2012-10-30 00:00:00: 81
## Past Due (31-60 days): 363 2013-02-26 00:00:00: 78
## (Other) : 1108 (Other) :54633
## BorrowerAPR BorrowerRate LenderYield
## Min. :0.00653 Min. :0.0000 Min. :-0.0100
## 1st Qu.:0.15629 1st Qu.:0.1340 1st Qu.: 0.1242
## Median :0.20976 Median :0.1840 Median : 0.1730
## Mean :0.21883 Mean :0.1928 Mean : 0.1827
## 3rd Qu.:0.28381 3rd Qu.:0.2500 3rd Qu.: 0.2400
## Max. :0.51229 Max. :0.4975 Max. : 0.4925
## NA's :25
## EstimatedEffectiveYield EstimatedLoss EstimatedReturn
## Min. :-0.183 Min. :0.005 Min. :-0.183
## 1st Qu.: 0.116 1st Qu.:0.042 1st Qu.: 0.074
## Median : 0.162 Median :0.072 Median : 0.092
## Mean : 0.169 Mean :0.080 Mean : 0.096
## 3rd Qu.: 0.224 3rd Qu.:0.112 3rd Qu.: 0.117
## Max. : 0.320 Max. :0.366 Max. : 0.284
## NA's :29084 NA's :29084 NA's :29084
## ProsperRating..numeric. ProsperRating..Alpha. ProsperScore
## Min. :1.000 :29084 Min. : 1.00
## 1st Qu.:3.000 C :18345 1st Qu.: 4.00
## Median :4.000 B :15581 Median : 6.00
## Mean :4.072 A :14551 Mean : 5.95
## 3rd Qu.:5.000 D :14274 3rd Qu.: 8.00
## Max. :7.000 E : 9795 Max. :11.00
## NA's :29084 (Other):12307 NA's :29084
## ListingCategory..numeric. BorrowerState
## Min. : 0.000 CA :14717
## 1st Qu.: 1.000 TX : 6842
## Median : 1.000 NY : 6729
## Mean : 2.774 FL : 6720
## 3rd Qu.: 3.000 IL : 5921
## Max. :20.000 : 5515
## (Other):67493
## Occupation EmploymentStatus
## Other :28617 Employed :67322
## Professional :13628 Full-time :26355
## Computer Programmer : 4478 Self-employed: 6134
## Executive : 4311 Not available: 5347
## Teacher : 3759 Other : 3806
## Administrative Assistant: 3688 : 2255
## (Other) :55456 (Other) : 2718
## EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
## Min. : 0.00 False:56459 False:101218
## 1st Qu.: 26.00 True :57478 True : 12719
## Median : 67.00
## Mean : 96.07
## 3rd Qu.:137.00
## Max. :755.00
## NA's :7625
## GroupKey DateCreditPulled
## :100596 2013-12-23 09:38:12: 6
## 783C3371218786870A73D20: 1140 2013-11-21 09:09:41: 4
## 3D4D3366260257624AB272D: 916 2013-12-06 05:43:16: 4
## 6A3B336601725506917317E: 698 2014-01-14 20:17:49: 4
## FEF83377364176536637E50: 611 2014-02-09 12:14:41: 4
## C9643379247860156A00EC0: 342 2013-09-27 22:04:54: 3
## (Other) : 9634 (Other) :113912
## CreditScoreRangeLower CreditScoreRangeUpper
## Min. : 0.0 Min. : 19.0
## 1st Qu.:660.0 1st Qu.:679.0
## Median :680.0 Median :699.0
## Mean :685.6 Mean :704.6
## 3rd Qu.:720.0 3rd Qu.:739.0
## Max. :880.0 Max. :899.0
## NA's :591 NA's :591
## FirstRecordedCreditLine CurrentCreditLines OpenCreditLines
## : 697 Min. : 0.00 Min. : 0.00
## 1993-12-01 00:00:00: 185 1st Qu.: 7.00 1st Qu.: 6.00
## 1994-11-01 00:00:00: 178 Median :10.00 Median : 9.00
## 1995-11-01 00:00:00: 168 Mean :10.32 Mean : 9.26
## 1990-04-01 00:00:00: 161 3rd Qu.:13.00 3rd Qu.:12.00
## 1995-03-01 00:00:00: 159 Max. :59.00 Max. :54.00
## (Other) :112389 NA's :7604 NA's :7604
## TotalCreditLinespast7years OpenRevolvingAccounts
## Min. : 2.00 Min. : 0.00
## 1st Qu.: 17.00 1st Qu.: 4.00
## Median : 25.00 Median : 6.00
## Mean : 26.75 Mean : 6.97
## 3rd Qu.: 35.00 3rd Qu.: 9.00
## Max. :136.00 Max. :51.00
## NA's :697
## OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries
## Min. : 0.0 Min. : 0.000 Min. : 0.000
## 1st Qu.: 114.0 1st Qu.: 0.000 1st Qu.: 2.000
## Median : 271.0 Median : 1.000 Median : 4.000
## Mean : 398.3 Mean : 1.435 Mean : 5.584
## 3rd Qu.: 525.0 3rd Qu.: 2.000 3rd Qu.: 7.000
## Max. :14985.0 Max. :105.000 Max. :379.000
## NA's :697 NA's :1159
## CurrentDelinquencies AmountDelinquent DelinquenciesLast7Years
## Min. : 0.0000 Min. : 0.0 Min. : 0.000
## 1st Qu.: 0.0000 1st Qu.: 0.0 1st Qu.: 0.000
## Median : 0.0000 Median : 0.0 Median : 0.000
## Mean : 0.5921 Mean : 984.5 Mean : 4.155
## 3rd Qu.: 0.0000 3rd Qu.: 0.0 3rd Qu.: 3.000
## Max. :83.0000 Max. :463881.0 Max. :99.000
## NA's :697 NA's :7622 NA's :990
## PublicRecordsLast10Years PublicRecordsLast12Months RevolvingCreditBalance
## Min. : 0.0000 Min. : 0.000 Min. : 0
## 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 3121
## Median : 0.0000 Median : 0.000 Median : 8549
## Mean : 0.3126 Mean : 0.015 Mean : 17599
## 3rd Qu.: 0.0000 3rd Qu.: 0.000 3rd Qu.: 19521
## Max. :38.0000 Max. :20.000 Max. :1435667
## NA's :697 NA's :7604 NA's :7604
## BankcardUtilization AvailableBankcardCredit TotalTrades
## Min. :0.000 Min. : 0 Min. : 0.00
## 1st Qu.:0.310 1st Qu.: 880 1st Qu.: 15.00
## Median :0.600 Median : 4100 Median : 22.00
## Mean :0.561 Mean : 11210 Mean : 23.23
## 3rd Qu.:0.840 3rd Qu.: 13180 3rd Qu.: 30.00
## Max. :5.950 Max. :646285 Max. :126.00
## NA's :7604 NA's :7544 NA's :7544
## TradesNeverDelinquent..percentage. TradesOpenedLast6Months
## Min. :0.000 Min. : 0.000
## 1st Qu.:0.820 1st Qu.: 0.000
## Median :0.940 Median : 0.000
## Mean :0.886 Mean : 0.802
## 3rd Qu.:1.000 3rd Qu.: 1.000
## Max. :1.000 Max. :20.000
## NA's :7544 NA's :7544
## DebtToIncomeRatio IncomeRange IncomeVerifiable
## Min. : 0.000 $25,000-49,999:32192 False: 8669
## 1st Qu.: 0.140 $50,000-74,999:31050 True :105268
## Median : 0.220 $100,000+ :17337
## Mean : 0.276 $75,000-99,999:16916
## 3rd Qu.: 0.320 Not displayed : 7741
## Max. :10.010 $1-24,999 : 7274
## NA's :8554 (Other) : 1427
## StatedMonthlyIncome LoanKey TotalProsperLoans
## Min. : 0 CB1B37030986463208432A1: 6 Min. :0.00
## 1st Qu.: 3200 2DEE3698211017519D7333F: 4 1st Qu.:1.00
## Median : 4667 9F4B37043517554537C364C: 4 Median :1.00
## Mean : 5608 D895370150591392337ED6D: 4 Mean :1.42
## 3rd Qu.: 6825 E6FB37073953690388BC56D: 4 3rd Qu.:2.00
## Max. :1750003 0D8F37036734373301ED419: 3 Max. :8.00
## (Other) :113912 NA's :91852
## TotalProsperPaymentsBilled OnTimeProsperPayments
## Min. : 0.00 Min. : 0.00
## 1st Qu.: 9.00 1st Qu.: 9.00
## Median : 16.00 Median : 15.00
## Mean : 22.93 Mean : 22.27
## 3rd Qu.: 33.00 3rd Qu.: 32.00
## Max. :141.00 Max. :141.00
## NA's :91852 NA's :91852
## ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
## Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.00 Median : 0.00
## Mean : 0.61 Mean : 0.05
## 3rd Qu.: 0.00 3rd Qu.: 0.00
## Max. :42.00 Max. :21.00
## NA's :91852 NA's :91852
## ProsperPrincipalBorrowed ProsperPrincipalOutstanding
## Min. : 0 Min. : 0
## 1st Qu.: 3500 1st Qu.: 0
## Median : 6000 Median : 1627
## Mean : 8472 Mean : 2930
## 3rd Qu.:11000 3rd Qu.: 4127
## Max. :72499 Max. :23451
## NA's :91852 NA's :91852
## ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
## Min. :-209.00 Min. : 0.0
## 1st Qu.: -35.00 1st Qu.: 0.0
## Median : -3.00 Median : 0.0
## Mean : -3.22 Mean : 152.8
## 3rd Qu.: 25.00 3rd Qu.: 0.0
## Max. : 286.00 Max. :2704.0
## NA's :95009
## LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination LoanNumber
## Min. : 0.00 Min. : 0.0 Min. : 1
## 1st Qu.: 9.00 1st Qu.: 6.0 1st Qu.: 37332
## Median :14.00 Median : 21.0 Median : 68599
## Mean :16.27 Mean : 31.9 Mean : 69444
## 3rd Qu.:22.00 3rd Qu.: 65.0 3rd Qu.:101901
## Max. :44.00 Max. :100.0 Max. :136486
## NA's :96985
## LoanOriginalAmount LoanOriginationDate LoanOriginationQuarter
## Min. : 1000 2014-01-22 00:00:00: 491 Q4 2013:14450
## 1st Qu.: 4000 2013-11-13 00:00:00: 490 Q1 2014:12172
## Median : 6500 2014-02-19 00:00:00: 439 Q3 2013: 9180
## Mean : 8337 2013-10-16 00:00:00: 434 Q2 2013: 7099
## 3rd Qu.:12000 2014-01-28 00:00:00: 339 Q3 2012: 5632
## Max. :35000 2013-09-24 00:00:00: 316 Q2 2012: 5061
## (Other) :111428 (Other):60343
## MemberKey MonthlyLoanPayment LP_CustomerPayments
## 63CA34120866140639431C9: 9 Min. : 0.0 Min. : -2.35
## 16083364744933457E57FB9: 8 1st Qu.: 131.6 1st Qu.: 1005.76
## 3A2F3380477699707C81385: 8 Median : 217.7 Median : 2583.83
## 4D9C3403302047712AD0CDD: 8 Mean : 272.5 Mean : 4183.08
## 739C338135235294782AE75: 8 3rd Qu.: 371.6 3rd Qu.: 5548.40
## 7E1733653050264822FAA3D: 8 Max. :2251.5 Max. :40702.39
## (Other) :113888
## LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees
## Min. : 0.0 Min. : -2.35 Min. :-664.87
## 1st Qu.: 500.9 1st Qu.: 274.87 1st Qu.: -73.18
## Median : 1587.5 Median : 700.84 Median : -34.44
## Mean : 3105.5 Mean : 1077.54 Mean : -54.73
## 3rd Qu.: 4000.0 3rd Qu.: 1458.54 3rd Qu.: -13.92
## Max. :35000.0 Max. :15617.03 Max. : 32.06
##
## LP_CollectionFees LP_GrossPrincipalLoss LP_NetPrincipalLoss
## Min. :-9274.75 Min. : -94.2 Min. : -954.5
## 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 0.00 Median : 0.0 Median : 0.0
## Mean : -14.24 Mean : 700.4 Mean : 681.4
## 3rd Qu.: 0.00 3rd Qu.: 0.0 3rd Qu.: 0.0
## Max. : 0.00 Max. :25000.0 Max. :25000.0
##
## LP_NonPrincipalRecoverypayments PercentFunded Recommendations
## Min. : 0.00 Min. :0.7000 Min. : 0.00000
## 1st Qu.: 0.00 1st Qu.:1.0000 1st Qu.: 0.00000
## Median : 0.00 Median :1.0000 Median : 0.00000
## Mean : 25.14 Mean :0.9986 Mean : 0.04803
## 3rd Qu.: 0.00 3rd Qu.:1.0000 3rd Qu.: 0.00000
## Max. :21117.90 Max. :1.0125 Max. :39.00000
##
## InvestmentFromFriendsCount InvestmentFromFriendsAmount Investors
## Min. : 0.00000 Min. : 0.00 Min. : 1.00
## 1st Qu.: 0.00000 1st Qu.: 0.00 1st Qu.: 2.00
## Median : 0.00000 Median : 0.00 Median : 44.00
## Mean : 0.02346 Mean : 16.55 Mean : 80.48
## 3rd Qu.: 0.00000 3rd Qu.: 0.00 3rd Qu.: 115.00
## Max. :33.00000 Max. :25000.00 Max. :1189.00
##
The Prosper loan data allows me to explore the following areas:
Basic Loan Information
The frequency plot shows that the most common borrower APR (nearly 10,000 loans) is around 0.17, and the second most common is around 0.2. Most borrower APRs fall between 0.12 and 0.35.
The histogram shows that over 75,000 borrowers chose a 36-month term, far more than chose 60 months. Very few borrowers chose a 12-month term.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 12.00 36.00 36.00 40.83 36.00 60.00
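A summary like the one above can be sketched in a few lines. The following is a minimal illustration in Python using only the standard library; the function name `six_number_summary` and the sample values are hypothetical and are not taken from the report's data. Quartiles depend on the interpolation convention, so other conventions may give slightly different values.

```python
from statistics import mean, quantiles

def six_number_summary(values):
    """Return min, quartiles, mean and max for a list of numbers."""
    data = sorted(values)
    # method="inclusive" interpolates between order statistics and treats
    # the sample minimum and maximum as the 0th and 100th percentiles.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "mean": mean(data),
        "q3": q3,
        "max": data[-1],
    }

# Hypothetical sample of loan terms in months (not the real data).
sample_terms = [12, 36, 36, 36, 36, 36, 60, 60]
summary = six_number_summary(sample_terms)
print(summary)
```

Because most loans share the same 36-month term, the first quartile and the median coincide in this sample, which is the same effect visible in the summary of Term.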
Borrowers most often take loans of around $4,000; $10,000 and $15,000 are also popular amounts. The frequency plot is positively skewed, with small peaks at every multiple of $5,000.
Basic Borrower Information
Now let’s dive into basic borrower info!
Over 60,000 borrowers are listed as Employed, with Full-time the second most common status. Retired borrowers are the smallest group.
Prosper has seven loan grades called Prosper Ratings: AA, A, B, C, D, E and HR, where AA is the lowest risk and HR stands for high risk.
Apart from the empty values, the distribution of this ordinal variable is roughly bell-shaped. ‘C’ is the most frequent rating in our data, and the highest (AA) and lowest (HR) ratings are less common than the ratings in between.
The number of missing values is surprisingly high. After some research, I found that the Prosper Rating only exists for loans made after July 2009. Cross-checking against the loan origination date confirmed this assumption.
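The cross-check against the origination date can be sketched as follows. This is a minimal Python illustration, not the code used for the report: the records are hypothetical, and the exact cutoff date (`RATING_START`, taken here as 1 July 2009) is an assumption based on the statement that ratings exist only after July 2009.

```python
from datetime import date

# Assumed cutoff; the report only says ratings exist "after July 2009".
RATING_START = date(2009, 7, 1)

# Hypothetical records: (loan origination date, Prosper rating or "" if missing).
loans = [
    (date(2007, 3, 14), ""),
    (date(2008, 9, 2), ""),
    (date(2010, 5, 20), "C"),
    (date(2013, 11, 13), "AA"),
]

def rating_matches_cutoff(origination, rating, cutoff=RATING_START):
    """True when the rating is missing before the cutoff and present from it on."""
    return (rating == "") == (origination < cutoff)

# Any record listed here would contradict the assumption.
violations = [loan for loan in loans if not rating_matches_cutoff(*loan)]
print(f"{len(violations)} of {len(loans)} records contradict the cutoff")
```

An empty list of violations supports the assumption; a non-empty one would point to records worth inspecting by hand.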
We can see from this plot that California has the most borrowers, followed by Texas, New York and Florida.
In terms of income, the most common range is $25,000-49,999, followed by $50,000-74,999. One possible explanation is that many borrowers in the $25,000-49,999 range are new graduates or young professionals who have strong buying power and are purchasing their first home or car.
Looking further at the annual income of the borrowers, this positively skewed frequency plot agrees with the income range plot and is consistent with strong buying power among young professionals.
Credit History
A large proportion of borrowers have only one delinquency on record. This is often because they forgot to pay a credit card's annual fee or missed a single monthly loan repayment.
Looking at delinquencies in the last 7 years and public records in the last 10 years, a single record is still the most common case. It is strange, however, that a few borrowers have more than 30 delinquencies.
How heavily do borrowers use their bank cards? For most borrowers, bank card utilization is above 0.5, and the distribution peaks at about 0.9. This suggests that borrowers rely heavily on their bank cards.
The histogram of the percentage of trades never delinquent is negatively skewed, which tells us that most borrowers maintain a good credit record and take their borrowing seriously. Only a few have a percentage below 0.50, so this variable can help a lender screen borrowers and decide whether to lend to them.
We also want to know how heavy a debt burden our borrowers carry. The frequency plot shows that the majority of borrowers have a debt-to-income ratio below 0.40, with a peak at 0.20. For these borrowers, income comfortably covers debt, and the ratio also stays within the rule that the “ratio of debt and income should [be] 0.85 or 0.8”.
What is the structure of your dataset?
The structure of the dataset covers the different loan interest rates, borrower’s employment status and income, liability ratio, delinquency records, etc.
What is/are the main feature(s) of interest in your dataset?
What other features in the dataset do you think will help support your
investigation into your feature(s) of interest?
Did you create any new variables from existing variables in the dataset?
I created an annual income variable from the stated monthly income (StatedMonthlyIncome).
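The report does not state the exact formula for the new variable. A minimal Python sketch, assuming annual income is simply twelve times the stated monthly income (the function name `annual_income` is hypothetical):

```python
MONTHS_PER_YEAR = 12

def annual_income(stated_monthly_income):
    """Annualise a monthly income figure; missing values stay missing."""
    if stated_monthly_income is None:
        return None
    return stated_monthly_income * MONTHS_PER_YEAR

# Median and mean StatedMonthlyIncome taken from the summary earlier in the report.
median_annual = annual_income(4667)
mean_annual = annual_income(5608)
print(median_annual, mean_annual)
```

Under this assumption, the median stated monthly income of 4,667 corresponds to an annual income of 56,004, which falls in the $50,000-74,999 range.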
Of the features you investigated, were there any unusual distributions?
Did you perform any operations on the data to tidy, adjust, or change the form
of the data? If so, why did you do this?
I cleaned the income range data, because “Not displayed” and null values distort the percentage share of each income range. I also excluded missing values in a few variables, such as DebtToIncomeRatio and BankcardUtilization.
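The effect of this cleaning on the percentages can be sketched as below. This is a minimal Python illustration with hypothetical sample values; which labels were actually dropped is an assumption (here "Not displayed" and the empty string), and the function name `income_range_shares` is made up for the example.

```python
from collections import Counter

# Labels treated as uninformative; the exact set the author dropped is an assumption.
EXCLUDED = {"Not displayed", ""}

def income_range_shares(income_ranges):
    """Share of each income range after dropping uninformative labels."""
    kept = [label for label in income_ranges if label not in EXCLUDED]
    counts = Counter(kept)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Hypothetical sample (not the real data).
sample = [
    "$25,000-49,999", "$25,000-49,999", "$50,000-74,999",
    "Not displayed", "", "$100,000+",
]
shares = income_range_shares(sample)
print(shares)
```

Without the filter, the two uninformative entries would take a third of the total and shrink every real income range's share.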
The univariate analysis gave me a big picture of the dataset. In this section, I analyse the relationships between variables. My main focus is still on:
- Customer Quality
- Loan Provider Profitability
- Credit Risk
Customer Quality
Let us take a closer look at our borrowers.
The loan amount increased slightly from the end of 2006 to 2007. However, as the subprime mortgage crisis spread across the nation and then worldwide, the loan amount dropped sharply from 2008 to 2009. Since 2010, the loan amount has grown exponentially: more and more people need cash, and the financial market is very active.
The plot above shows that borrowers with lower income tend to carry a heavier debt burden. The average debt-to-income ratio of the $1-24,999 group is slightly higher than that of the $25,000-49,999 group. It is worth noting that the $1-24,999 group has many more outliers than the others, and it is strange that some borrowers have a debt-to-income ratio of about 10.0.
Comparing monthly loan payment and debt-to-income ratio against monthly income, monthly loan payment is heavily concentrated in the monthly income range $0 to $10,000, while the debt-to-income ratio shows a positively skewed distribution.
This plot shows the Prosper rating for each employment status in each state. ND and IA have missing values. Self-employed borrowers in MA and OR have the worst Prosper rating (HR) and the highest risk. In contrast, self-employed borrowers in AL have the best Prosper rating (AA).
The line plot shows that, as total trades grow, average bank card utilization rises from 0.5 to 0.75 and then falls back to 0.6. When a borrower has fewer than 50 total trades, bank card utilization is relatively high, which means the borrower is more likely to use the bank card for transactions.
We also care about which states have the most borrowers. CA has the highest number, which suggests that California consumers carry some of the highest levels of debt in the country. Texas, New York and Florida follow and can also be seen as large loan markets.
Borrowers with income over $100,000 have a higher chance of getting a loan over $12,000. The highest loan amount reaches $35,000. It is interesting that borrowers who are not employed can get higher loan amounts than borrowers in the $1-24,999 income range.
Loan Provider Profitability
At first, interest and fees follow the same pattern as the loan amount, but they diverge at the end. The diagram shows that interest and fees increased slightly from 2006 Q1 to 2007 Q2, then decreased slightly until 2008 Q4. They hit bottom in 2009 Q2, with no interest or fees at all, then bounced back and reached a peak in 2011 Q2. This growth did not last long: interest and fees kept falling after 2012 Q2.
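A quarterly trend like this only reads correctly if the quarters are in chronological order. Labels in the form used by LoanOriginationQuarter (for example "Q4 2013") sort alphabetically by quarter first, which mixes years. A minimal Python sketch of a sort key that fixes this; the helper name `quarter_key` is hypothetical, and the labels are the six most frequent quarters from the summary earlier in the report.

```python
def quarter_key(label):
    """Turn a label such as "Q4 2013" into a sortable (year, quarter) tuple."""
    quarter, year = label.split()
    return int(year), int(quarter.lstrip("Q"))

labels = ["Q4 2013", "Q1 2014", "Q3 2013", "Q2 2013", "Q3 2012", "Q2 2012"]

alphabetical = sorted(labels)                    # mixes years together
chronological = sorted(labels, key=quarter_key)  # oldest quarter first
print(alphabetical)
print(chronological)
```

With the key, "Q1 2014" moves from the first position to the last, after every 2013 quarter.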
For the HR Prosper rating, estimated loss and estimated return are inversely related, because high-risk credit has a high chance of losing money and is hard to recover.
Credit Risk
The loan provider adopts different strategies towards borrowers with different Prosper ratings. In general, borrower APR, borrower rate and lender yield follow a similar pattern across Prosper ratings. As expected, the loan provider charges the highest interest rate to the highest-risk borrowers.
Plotted against borrower APR, current delinquencies are widely spread from 20 to 32. Plotted against borrower rate, they are widely spread from 15 to 28, with another peak at 35. Plotted against lender yield, they are widely spread from 20 to 28, again with another peak at 35. In general, however, the rates at which delinquencies occur are below 0.2.
A large proportion of borrowers with fewer than 10 current delinquencies have a debt-to-income ratio between 0 and 0.2. It is strange, though, that borrowers with around 100 delinquencies have a low debt-to-income ratio.
Talk about some of the relationships you observed in this part of the
investigation. How did the feature(s) of interest vary with other features in
the dataset?
Borrowers with lower income carry a heavier debt burden, according to the plot of debt-to-income ratio by income range.
Did you observe any interesting relationships between the other features
(not the main feature(s) of interest)?
It is interesting that the loan original amount grew significantly from 2012 to 2014, while interest and fees went down dramatically over the same period, even though they had followed the same pattern as the loan original amount before 2012.
What was the strongest relationship you found?
The prosper rating and borrower rates have the strongest relationship.
In this section, I analyse customer quality, loan provider profitability and credit risk together. Each plot gives an overall profile across the three aspects.
Across income ranges, the HR Prosper rating has the highest borrower APR and the AA Prosper rating has the lowest. However, it is interesting that the borrower rate does not always go up as income increases. For example, the borrower APR of the E Prosper rating goes down as income grows. If the borrower is not employed, however, the borrower APR is always higher than in any other income range.
## # A tibble: 50 x 4
## LoanOriginalAmount ProsperRating..Alpha. mean_BorrowerAPR n
## <int> <fctr> <dbl> <int>
## 1 1000 NA 2445
## 2 1000 A 0.13733354 82
## 3 1000 AA 0.07455752 105
## 4 1000 B 0.18256000 16
## 5 1000 C 0.22367717 173
## 6 1000 D 0.27935232 138
## 7 1000 E 0.34687778 99
## 8 1000 HR 0.37400014 148
## 9 1001 0.20628375 8
## 10 1005 0.34020000 2
## # ... with 40 more rows
The loan amounts provided to customers with different Prosper ratings and borrower rates are similar. Most loan amounts are concentrated between $1,000 and $25,000.
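The grouped table above has the shape of a group-by followed by a mean and a count. A minimal Python sketch of that aggregation, using hypothetical rows rather than the real data; the function name `mean_apr_by_group` is made up for the example. It also shows one plausible reason for a missing mean in a group: if any value in the group is missing and missing values are not dropped, the mean is left undefined.

```python
from collections import defaultdict
from statistics import mean

def mean_apr_by_group(rows):
    """Group rows by (loan amount, rating); return (mean APR, count) per group."""
    groups = defaultdict(list)
    for amount, rating, apr in rows:
        groups[(amount, rating)].append(apr)
    result = {}
    for key in sorted(groups):
        aprs = groups[key]
        # One missing APR makes the whole group's mean undefined here.
        group_mean = None if any(a is None for a in aprs) else mean(aprs)
        result[key] = (group_mean, len(aprs))
    return result

# Hypothetical rows: (LoanOriginalAmount, ProsperRating..Alpha., BorrowerAPR).
rows = [
    (1000, "A", 0.12), (1000, "A", 0.16),
    (1000, "HR", 0.37),
    (1000, "", None), (1000, "", 0.20),
]
result = mean_apr_by_group(rows)
for key, (group_mean, n) in result.items():
    print(key, group_mean, n)
```

Dropping missing values before averaging would instead give the unrated group a mean based on the remaining rows, so the choice should be made deliberately.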
## # A tibble: 50 x 4
## LoanOriginalAmount BorrowerState mean_LenderYield n
## <int> <fctr> <dbl> <int>
## 1 1000 0.1584107 521
## 2 1000 AK 0.1755500 6
## 3 1000 AL 0.2531619 21
## 4 1000 AR 0.1879650 20
## 5 1000 AZ 0.1815436 55
## 6 1000 CA 0.1874984 251
## 7 1000 CO 0.1821296 54
## 8 1000 CT 0.1660714 21
## 9 1000 DC 0.1987000 3
## 10 1000 DE 0.1467000 8
## # ... with 40 more rows
Very high-interest loans have boomed in CA. We can also spot a few other states with very high interest rates and loan amounts, such as FL, OR, NY, DC and MA.
## # A tibble: 50 x 4
## IncomeRange DebtToIncomeRatio mean_CurrentDelinquencies n
## <fctr> <dbl> <dbl> <int>
## 1 $1-24,999 0.02 0.8571429 7
## 2 $1-24,999 0.03 2.2000000 20
## 3 $1-24,999 0.04 1.1351351 37
## 4 $1-24,999 0.05 1.5937500 64
## 5 $1-24,999 0.06 1.7666667 60
## 6 $1-24,999 0.07 1.0416667 72
## 7 $1-24,999 0.08 1.8390805 87
## 8 $1-24,999 0.09 1.0952381 84
## 9 $1-24,999 0.10 1.1372549 102
## 10 $1-24,999 0.11 1.0265487 113
## # ... with 40 more rows
Average current delinquencies are very high among the $1-24,999 and $50,000-74,999 income groups. A likely reason is that these groups have more borrowers and very high loan amounts. The $100,000+ income group also appears prone to delinquency, possibly because this group uses the loan money for block trades.
## # A tibble: 50 x 4
## ProsperRating..Alpha. EstimatedLoss mean_CurrentDelinquencies n
## <fctr> <dbl> <dbl> <int>
## 1 A 0.0200 0.23536036 888
## 2 A 0.0210 0.06650446 1233
## 3 A 0.0224 0.08321580 709
## 4 A 0.0249 0.16837482 1366
## 5 A 0.0260 0.19602978 403
## 6 A 0.0274 0.11344137 1049
## 7 A 0.0299 0.13573620 1304
## 8 A 0.0324 0.16180150 1199
## 9 A 0.0325 0.09433962 106
## 10 A 0.0330 0.13391557 687
## # ... with 40 more rows
Average current delinquencies are highest for Prosper rating HR, followed by E. Borrowers rated AA are the least likely to be delinquent.
Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of
looking at your feature(s) of interest?
I am glad to have found the relationships between interest rate and borrowers in different income groups and Prosper ratings. This is meaningful because it gives the loan provider some ideas about how to price interest rates for different levels of customer quality.
Were there any interesting or surprising interactions between features?
I was surprised to see that borrowers whose income is above $100,000 appear to be linked to high levels of delinquency. I had expected their stronger financial position to make delinquency less likely.
Customer Quality
This plot shows customer quality from two aspects: loan original amount and income range. The loan provider is interested in how large a loan it can offer a customer and which income range that customer is in.
From this plot, we can see that the $1-24,999 income group is most likely to borrow around $4,300; the $25,000-49,999 group around $6,100; the $50,000-74,999 group around $8,800; the $75,000-99,999 group around $11,400; and the $100,000+ group around $13,000.
Loan Provider Profitability
Next, I want to know how much a loan provider can earn from lending. I checked average interest and fees from 2006 Q1 to 2014 Q1 and found that interest and fees increased slightly from the end of 2006 to 2007, then dropped significantly from 2008 to 2009 under the effect of the subprime mortgage crisis.
Encouragingly, interest and fees kept growing from 2010 to 2011, but they fell sharply from 2012 to 2014. The decrease that began in 2012 may be because banks started to provide more loans to borrowers, so that loan providers lost much of their share of the US financial market.
Based on this diagram, loan provider profitability is heavily affected by the financial market, which is important for loan providers to take into account when measuring their profitability.
## # A tibble: 50 x 4
## ProsperRating..Alpha. EstimatedLoss mean_CurrentDelinquencies n
## <fctr> <dbl> <dbl> <int>
## 1 A 0.0200 0.23536036 888
## 2 A 0.0210 0.06650446 1233
## 3 A 0.0224 0.08321580 709
## 4 A 0.0249 0.16837482 1366
## 5 A 0.0260 0.19602978 403
## 6 A 0.0274 0.11344137 1049
## 7 A 0.0299 0.13573620 1304
## 8 A 0.0324 0.16180150 1199
## 9 A 0.0325 0.09433962 106
## 10 A 0.0330 0.13391557 687
## # ... with 40 more rows
Credit Risk
At the estimated loss levels of the AA Prosper rating, delinquencies are the lowest. The points for Prosper ratings B, C, E and HR are scattered. The HR Prosper rating has the highest estimated loss, above 0.3, and the highest average current delinquencies, over 24. This plot shows the credit risk and potential loss for each Prosper rating, and the distribution is reasonable.
In this report, I am glad to have found the relationships between Prosper rating and interest rate, delinquency and income range, delinquency and debt-to-income ratio, and income range and loan amount.
During this EDA project, I struggled with the choice of plot type, variables, and aesthetic parameters (e.g. bin width, color, axis breaks), and I worked hard to make each plot display appropriately. I also considered how to avoid overplotting and how to make sure the axis labels were not cut off. In the bivariate and multivariate analysis sections especially, I needed to keep to my line of reasoning and pick the right variables to show the ‘Customer Quality’, ‘Loan Provider Profitability’, and ‘Credit Risk’ outcomes.
What I achieved was an understanding of which plot types suit univariate analysis and which suit bivariate analysis. I showed the basic borrower profile and displayed the interest rate and loan amount across states and employment statuses. I also demonstrated the relationship between Prosper rating and current delinquency records.
In the future, I am keen to explore how to set an interest rate for customers of different quality.